14 research outputs found
A BERT-based dual embedding model for Chinese idiom prediction
Chinese idioms are special fixed phrases usually derived from ancient
stories, whose meanings are often highly idiomatic and non-compositional.
The Chinese idiom prediction task is to select the correct idiom from a set of
candidate idioms given a context with a blank. We propose a BERT-based dual
embedding model to encode the contextual words as well as to learn dual
embeddings of the idioms. Specifically, we first match the embedding of each
candidate idiom with the hidden representation corresponding to the blank in
the context. We then match the embedding of each candidate idiom with the
hidden representations of all the tokens in the context through context
pooling. We further propose to use two separate idiom embeddings for the two
kinds of matching. Experiments on a recently released Chinese idiom cloze test
dataset show that our proposed method performs better than the existing state
of the art. Ablation experiments also show that both context pooling and dual
embedding contribute to the improvement of performance.
Comment: COLING 202
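The two matching steps described above can be sketched numerically. This is a minimal illustration only: random vectors stand in for the BERT hidden states, and all names (`E_blank`, `E_ctx`, `context_pool`) are assumptions for exposition, not the paper's actual code. The key idea shown is that each candidate idiom has two separate embeddings, one matched against the blank position and one matched against a pooled summary of the whole context.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, n_candidates, seq_len = 8, 7, 12

# Hypothetical encoder output: one hidden vector per context token,
# with blank_pos marking the position of the blank ([MASK]) token.
H = rng.standard_normal((seq_len, hidden_dim))
blank_pos = 5

# Dual embeddings: one table for blank matching, one for context matching.
E_blank = rng.standard_normal((n_candidates, hidden_dim))
E_ctx = rng.standard_normal((n_candidates, hidden_dim))

def context_pool(H, e):
    """Attention-style pooling: weight each token by its similarity to idiom e."""
    sims = H @ e
    w = np.exp(sims - sims.max())
    w /= w.sum()
    return w @ H  # (hidden_dim,) pooled context vector

# Matching 1: idiom embedding vs. hidden state at the blank.
blank_scores = E_blank @ H[blank_pos]
# Matching 2: idiom embedding vs. pooled representation of all tokens.
ctx_scores = np.array([context_pool(H, e) @ e for e in E_ctx])

# Final score combines both kinds of matching; highest score wins.
scores = blank_scores + ctx_scores
pred = int(np.argmax(scores))
```

A trained model would learn `E_blank` and `E_ctx` jointly with the encoder and pick the candidate with the highest combined score.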
Efficient organic solar cells enabled by simple non-fused electron donors with low synthetic complexity
Abstract Fused-ring electron donors boost the efficiency of organic solar cells (OSCs), but they suffer from high cost and low yield owing to their large synthetic complexity (SC > 30%). Herein, the authors develop a series of simple non-fused-ring electron donors, PF1 and PF2, which alternately consist of furan-3-carboxylate and 2,2′-bithiophene. Note that PF1 and PF2 present a very small SC of 9.7% owing to their inexpensive raw materials, facile synthesis, and high synthetic yield. Compared to their all-thiophene-backbone counterpart PT-E, the two new polymers feature a larger conjugated plane, resulting in higher hole mobility, especially a value up to ~10⁻⁴ cm² V⁻¹ s⁻¹ for PF2 with its longer alkyl side chain. Meanwhile, PF1 and PF2 exhibit a larger dielectric constant and deeper electronic energy levels versus PT-E. Benefiting from these better physicochemical properties, the efficiencies of PF1- and PF2-based devices are improved by ~16.7% and ~71.3% relative to those of PT-E-based devices, respectively. Furthermore, optimized PF2-based devices introducing PC71BM as a third component deliver a higher efficiency of 12.40%. The work not only indicates that furan-3-carboxylate is a simple yet efficient building block for constructing non-fused-ring polymers but also provides a promising electron donor, PF2, for the low-cost production of OSCs.
A simple-structure non-fused-ring electron donor, PF2, alternately consisting of furan-3-carboxylate and 2,2′-bithiophene, presents a very small synthetic complexity of 9.7% as well as a low material cost of ~19.0 $ g⁻¹. More importantly, PF2 delivers a high efficiency of 12.4% coupled with strong operational stability.
HiJoNLP at SemEval-2022 Task 2: Detecting Idiomaticity of Multiword Expressions using Multilingual Pretrained Language Models
This paper describes an approach to detect idiomaticity only from the
contextualized representation of an MWE over multilingual pretrained language
models. Our experiments find that larger models are usually more effective in
idiomaticity detection. However, using a higher layer of the model does not
guarantee better performance. In multilingual scenarios, the convergence of
different languages is not consistent, and rich-resource languages have a
clear advantage over other languages.
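The probing setup described above can be sketched as follows. This is a schematic with random arrays standing in for a multilingual encoder's per-layer outputs (a real run would use e.g. XLM-R hidden states); the helper names `mwe_representation` and `idiomatic_prob`, the span indices, and the probe weights are all invented for illustration. It shows the core recipe: pool the MWE's contextualized token vectors at a chosen layer, then score them with a lightweight classifier, repeating per layer to compare layers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, seq_len, dim = 13, 10, 16  # e.g. a 12-layer encoder + embedding layer

# Hypothetical per-layer hidden states for one sentence containing an MWE.
hidden_states = rng.standard_normal((n_layers, seq_len, dim))

def mwe_representation(hidden_states, span, layer):
    """Mean-pool the contextualized vectors of the MWE's tokens at one layer."""
    start, end = span
    return hidden_states[layer, start:end].mean(axis=0)

# Probe: a linear classifier over the pooled span vector (weights invented).
w, b = rng.standard_normal(dim), 0.0

def idiomatic_prob(vec):
    """Sigmoid score: probability the MWE is used idiomatically."""
    return 1.0 / (1.0 + np.exp(-(vec @ w + b)))

# Build one probe score per layer, as in a layer-wise analysis: a higher
# layer does not automatically give a better (more separable) representation.
probs = [idiomatic_prob(mwe_representation(hidden_states, (3, 6), layer))
         for layer in range(n_layers)]
```

In the actual experiments the probe would be trained on labeled data per layer and per language; the comparison across layers and languages is what surfaces the inconsistencies the abstract reports.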
Exploring and adapting Chinese GPT to pinyin input method
While GPT has become the de-facto method for text generation tasks, its
application to pinyin input method remains unexplored. In this work, we make
the first exploration to leverage Chinese GPT for pinyin input method. We find
that a frozen GPT achieves state-of-the-art performance on perfect pinyin.
However, the performance drops dramatically when the input includes abbreviated
pinyin. One reason is that an abbreviated pinyin can map to many perfect
pinyins, which in turn link to an even larger number of Chinese characters. We mitigate
this issue with two strategies, including enriching the context with pinyin and
optimizing the training process to help distinguish homophones. To further
facilitate the evaluation of pinyin input method, we create a dataset
consisting of 270K instances from 15 domains. Results show that our approach
improves performance on abbreviated pinyin across all domains. Model analysis
demonstrates that both strategies contribute to the performance boost.
Comment: To appear in ACL 202
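The ambiguity that makes abbreviated pinyin hard can be made concrete with a toy example. The dictionary below is a tiny invented sample (not the paper's data), and both helper functions are assumptions for illustration: they show how one abbreviation fans out to several perfect pinyins and, through them, to many candidate characters.

```python
# Tiny invented sample: each perfect pinyin maps to several Chinese characters.
perfect_pinyin = {
    "shi": ["是", "时", "十", "事", "实"],
    "shan": ["山", "善", "闪"],
    "shang": ["上", "商", "伤"],
}

def expand_abbreviation(abbr: str) -> list[str]:
    """All perfect pinyins whose spelling starts with the abbreviated input."""
    return [p for p in perfect_pinyin if p.startswith(abbr)]

def candidate_chars(abbr: str) -> list[str]:
    """Every character reachable from a (possibly abbreviated) pinyin input."""
    return [c for p in expand_abbreviation(abbr) for c in perfect_pinyin[p]]

# The perfect pinyin "shi" selects a single syllable (5 characters), while
# the abbreviation "sh" fans out to 3 syllables and 11 candidate characters.
# That fan-out is the ambiguity the paper's two strategies aim to reduce.
```

Even in this toy setting the candidate set roughly doubles when the input is abbreviated, which is why enriching the context with pinyin and training the model to separate homophones both help.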